The workshop brought together researchers in computational vision and
psychophysics to discuss ways of conceptualizing and modeling problems
in visual perception. Such a conceptualization requires common frameworks
for formulating problems in perception. Workshop participants considered
what formal tools and structures these frameworks should provide in
order to be most useful for the study of human vision. Several recently
proposed frameworks based on the formulation of Bayesian, probabilistic
inference served as the focal point for evaluation and discussion.

Abstract:

The Workshop brings together researchers in computational vision and
psychophysics to discuss ways of conceptualizing and modeling problems in
visual perception. Such a conceptualization requires common frameworks for
formulating problems in perception. Workshop participants will consider what
formal tools and structures these frameworks should provide in order to be most
useful for the study of human vision. Several recently proposed frameworks based
on the formalism of Bayesian, probabilistic inference will serve as the focal point
for evaluation and discussion.

Motivation:

In the decade since Marr's seminal work Vision: A computational investigation into the human
representation and processing of visual information, advances in computational vision and
psychophysics have led to many changes in the way we conceptualize and study problems in
visual perception. Despite these advances, most fruitful interactions between computational and
psychophysical work have been, with some notable exceptions, limited to the study of low-level
visual mechanisms. Much less common are studies which integrate computational and
psychophysical approaches to problems in higher-level visual processing such as object
recognition, shape perception, cue integration and cooperative estimation of scene properties.
Given the progress made on these problems in computational vision and psychophysics over the
last decade, the time is ripe to bring researchers in these fields together to discuss ways of
conceptualizing and modeling problems in vision which support the integration of knowledge
from computational and psychophysical studies. A prerequisite for such integration is the
existence of common frameworks for formulating problems in vision.

Recently, a number of groups of researchers have developed formal frameworks for
vision built on the principles of Bayesian, probabilistic reasoning. These were developed with an
eye toward providing a common language and set of formal tools for specifying both
computational theories and psychophysically testable hypotheses about higher-level visual
processing. The frameworks will form the focus of discussion for workshop participants, who
will include computer scientists, mathematicians and psychophysicists. Participants will
evaluate, critique and discuss extensions and/or alternatives to the frameworks. The computer
scientists and mathematicians in the group will provide the expertise for evaluating the
sufficiency of the frameworks for formalizing and building computational theories. The
psychophysicists will provide knowledge of the perceptual phenomena with which the
frameworks must be tested to determine their usefulness in developing models of human vision.

Objective: 

We hope to make this the first of a series of annual or semi-annual workshops focused on
combining computational and psychophysical approaches to visual perception. The goal of this
first workshop is to evaluate the prospects for general, formal frameworks which integrate
computational and psychophysical approaches to perception. Four preliminary frameworks
which have been proposed will serve as the basis for discussions in the workshop. The
discussions will center on a number of critical issues revolving around the usefulness of these
frameworks for the study of human vision:

• What are the computational strengths and weaknesses of the different frameworks?

•  What  domains  of  visual  processing  do  the  frameworks  provide  useful  languages  for  modeling? 

• What does psychophysics tell us about what is needed in a general framework of vision?

• How should the frameworks be extended or modified to increase their generality of
application?

•  Do  the  frameworks  suggest  new  ways  of  asking  questions  for  psychophysicists? 

• What new experimental paradigms do the frameworks suggest for psychophysics?

Answers to these questions will lead to new ways of conceptualizing computational problems in
visual  perception  which  are  amenable  to  psychophysical  investigation. 

Talk Summaries


How  to  Represent  Data  to  Facilitate  Association  Formation 

Horace Barlow
Cambridge  University 

It is generally agreed that the human neocortex is what makes us unique, but it is far from
clear what it does to us or how it does it. Comparative anatomists have been saying since
Herrick's time that it seems to give us greater knowledge of the world around us, and it
can do this in two ways. First, knowledge of the world can be inherited, as numerous
examples of instinctive behavior in insects and other animals prove. I don't think one
should underestimate the importance of this in higher animals and humans: think of the
special skills of a sheep dog, a genius like Mozart or Einstein, or the depressing lack of
such skills in those under-endowed mentally. But even in these cases, what varies seems
to be the capacity to acquire special or general knowledge, and this requires analysis of
the noisy information provided by our senses, which is the second way referred to above.

Much of the neocortex is devoted to representing the current sensory scene. The
thoughts above suggest that it may provide a representation specialized to facilitate the
acquisition of knowledge. Following Shannon, the stream of sensory data can be split
conceptually into information and redundancy; in spite of the pejorative term, the latter
consists of all the structure, regularity and repetition in the stream of messages, and it is
this part that provides us with knowledge of the world. The information by itself would
be totally irregular and unpredictable and would in fact have the properties of random
noise. However, the separation can only be done if everything about the structure and
regularity contained in the sensory messages is known, and since knowledge is never
complete, the real split is between known structure, known regularity and known
repetition, and a residue that contains evidence for undiscovered, new, knowledge as well
as the intrinsically unpredictable, apparently noisy information.

Three principles for a representation that would facilitate the acquisition of new
knowledge will be discussed. 1) Remove evidence for the associative structures you
already know about; one way of doing this is to detect conjunctions of activity in the
input that occur more often than they would if randomly associated (i.e. they are
"suspicious coincidences"), and use these as the representative elements passed on to the
next stage. 2) Ensure that the probabilities of occurrence, as well as the actual
occurrences, of the representational elements are available. 3) Ensure that the
representational elements occur as far as possible independently of each other in the
environment for which the representation is to be used.
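
As a rough illustration of the first principle, a "suspicious coincidence" can be read as a pair of input features whose joint frequency exceeds the product of their individual frequencies. The sketch below is not taken from the talk: the function name `suspicious_coincidences`, the encoding of inputs as binary tuples, and the ratio threshold of 2.0 are all assumptions made for illustration.

```python
from itertools import combinations

def suspicious_coincidences(samples, ratio_threshold=2.0):
    """Flag feature pairs that co-occur more often than independence predicts.

    samples: list of equal-length tuples of 0/1 feature activations (assumed encoding).
    Returns (i, j, ratio) triples, where ratio = p(i and j) / (p(i) * p(j)).
    """
    n = len(samples)
    n_features = len(samples[0])
    # Marginal probability that each feature is active.
    p = [sum(s[i] for s in samples) / n for i in range(n_features)]
    flagged = []
    for i, j in combinations(range(n_features), 2):
        p_joint = sum(1 for s in samples if s[i] and s[j]) / n
        expected = p[i] * p[j]  # joint probability if the two features were independent
        if expected > 0 and p_joint / expected >= ratio_threshold:
            flagged.append((i, j, p_joint / expected))
    return flagged

# Hypothetical data: features 0 and 1 always fire together, feature 2 is unrelated.
samples = [
    (1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1),
    (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1),
]
print(suspicious_coincidences(samples))
```

A detector of this kind also touches on the second and third principles, since it needs the marginal probabilities of the elements and treats independence as the baseline against which structure is measured.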

Finally, I shall briefly discuss the Yellow Volkswagen problem: Can a system
successfully detect associations with a conjunction of features without having elements
that specifically detect that conjunction?


Lebesgue Logic and the Bayesian Foundations of Observer Theory

Bruce M. Bennett and Donald D. Hoffman
University of California, Irvine

Perceptual scientists have recently enjoyed success in constructing mathematical theories
for specific perceptual capacities, capacities such as stereo-vision, auditory localization,
and color perception. Analysis of these theories suggests that they all share a common
mathematical structure. If this is true, the elucidation of this structure, the study of its


Our conceptual framework for this problem consists of three parts: a knowledge
base, a state space of interpretations consistent with the knowledge base for the particular
image, and elementary preference relations. The preference relations are used to place a
partial ordering on the interpretations in the state space. This allows us to define a percept
as the interpretation(s) associated with maximal nodes in the ordering.
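
This definition can be sketched in a few lines, assuming the elementary preference relations are supplied as (better, worse) pairs over a finite set of consistent interpretations and contain no cycles. The interpretation labels and the function name `maximal_nodes` are hypothetical and are not part of the framework described in the talk.

```python
def maximal_nodes(interpretations, preferences):
    """Return the interpretations that no other interpretation is preferred to.

    preferences: iterable of (better, worse) pairs, the assumed encoding of the
    elementary preference relations. The partial ordering is their transitive
    closure, so a node is non-maximal exactly when it appears as 'worse' in
    at least one elementary pair.
    """
    dominated = {worse for _better, worse in preferences}
    return [x for x in interpretations if x not in dominated]

# Hypothetical state space with four consistent interpretations.
interpretations = ["A", "B", "C", "D"]
preferences = [("A", "B"), ("B", "C"), ("D", "C")]

print(maximal_nodes(interpretations, preferences))
```

In this toy ordering two interpretations remain maximal, which is the situation in which some further step is needed to choose one percept over another.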

Although conceptually simple, this definition of a percept raises many difficult
issues. For example, what kinds of knowledge representations will support the reasoning
process required to find consistent interpretations and nodes? (We entertain one
such proposal by Feldman.) Do we need to revise preference relations for each image or
context? (Surprisingly, little revision may be needed.) How are priors treated? (We use
only binary weights.) How are maximal nodes sought out and identified? When multiple
nodes are created in the partial ordering, how then is one percept chosen over another?
(Here we show how the initial percept may lead to a revision of the current ordering of
the consistent interpretations.) When does it make sense to re-examine the image data and
knowledge base to determine whether there is a "better" interpretation that explains more
of the data? And finally, when should this process be brought to a halt? (These latter
issues relate to the coherence of a percept.) For all of the above, we stress issues of
competence, not performance.


Ideal observers and ideal worlds: A Bayesian view of visual information processing

David  Knill  and  Daniel  Kersten 
University  of  Minnesota 

In studying the problem of visual perception, it is necessary to decompose the general
problem into small, manageable pieces. How we break up the problem and the language
we use to characterize and solve sub-problems determine how well we can re-integrate
these partial solutions into a general model of human perception. We argue that for
problems in mid and high level vision (e.g. shape perception and object recognition), the
most promising level at which to formulate problems and models is the computational
level, and that the proper framework for this formulation is a Bayesian one. In this talk,
we describe a particular "Bayesian" view of visual information processing based on the
twin metaphors of ideal observers (a form of competence theory) and ideal worlds (a
form of performance theory). The framework provides a means for a "strong" integration
of computation and psychophysics in building models of perception by providing a
common language for formulating models of competence and performance.

Both ideal observers and ideal worlds are characterized by posterior distributions,
p(S|I), specifying the probability density function for possible interpretations of a set of
scene properties S, conditional on the image data I. An ideal observer does the best
possible job of estimating S from I in our environment. It consists of five components: a
specification of what scene properties are being estimated (perceived), a specification of
the data for the estimation, a criterion for the estimation (e.g. MAP), a likelihood function
(a model of image formation and image uncertainty) and a model of the prior probability
density function for elements in the space of possible interpretations of S. An ideal world
consists of the same five components with the exception that the likelihood function and
the prior model are internalized in a human observer's visual system. An ideal world can
be viewed as a description of the world in which a given human observer would be the
ideal observer. Ideal observers and worlds for specific domains and problems can be
incorporated into more general ideals by noting that the constituent elements of the
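
The five components listed above can be made concrete with a small sketch of a discrete Bayesian estimator, in which the posterior over scene properties S given image data I is proportional to the likelihood times the prior and the MAP criterion picks its peak. Everything specific here is an assumption chosen for illustration rather than a detail of the talk: the toy scene property (surface slant in degrees), the Gaussian prior and noise model, and the names `gaussian`, `posterior` and `map_estimate`.

```python
import math

def gaussian(x, mean, sd):
    """Gaussian probability density evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior(candidates, prior, likelihood, image_datum):
    """Discrete posterior p(S|I), proportional to p(I|S) * p(S)."""
    unnormalized = [likelihood(image_datum, s) * prior(s) for s in candidates]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def map_estimate(candidates, post):
    """MAP criterion: the candidate with the highest posterior probability."""
    best_index = max(range(len(candidates)), key=lambda k: post[k])
    return candidates[best_index]

# Hypothetical toy problem: S is surface slant, I is a noisy slant measurement.
slants = list(range(0, 85, 5))                   # scene properties being estimated
prior = lambda s: gaussian(s, 0.0, 30.0)         # assumed bias toward frontoparallel surfaces
likelihood = lambda i, s: gaussian(i, s, 10.0)   # assumed image noise, sd = 10 degrees

post = posterior(slants, prior, likelihood, image_datum=40.0)
estimate = map_estimate(slants, post)
print(estimate)
```

With these assumed numbers the prior pulls the estimate slightly below the measured 40 degrees. Swapping in the likelihood and prior that a particular human observer appears to use, instead of those of the real environment, would give the corresponding ideal-world version of the same calculation.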


Instead we propose strong coupling in which attention is paid to the degree of
dependence between the likelihood functions and prior assumptions of two sources.

Computer vision theories tend to use generic prior assumptions that are
supposedly valid for a large variety of scenes. We suggest instead that it is preferable to
use a set of competing specific prior assumptions geared towards the tasks the visual
system is intended to perform. We argue that this concept of competitive priors is
desirable on theoretical grounds and is supported by experimental evidence.


Ideal observers and Psychophysics: Shape from Texture

Heinrich H. Bülthoff†, Andrew Blake* and David Sheinberg†
†Brown University
*Oxford University

We describe an ideal observer model for estimating "Shape from Texture" which is
derived from the principles of statistical information. For a given family of surface
shapes, measures of statistical information can be computed for two different texture cues
- density and orientation of texels. These measures can be used to predict lower bounds
on the variance of shape judgements of "ideal" and human observers. They can also
predict the optimal weights for cue integration for shape from texture. These weights are
directly proportional to the information carried by each cue. The ideal observer model
would thus predict that the variance of subjects' responses in a psychophysical shape
adjustment task should reflect the statistical importance of individual texture cues. Our
results show that human performance in shape judgements for a one-parameter family of
parabolic cylinders is often better than the ideal observer using only a density cue.
Therefore other information, for example the compression cue, must be used by human
observers. For the first time, such results have been obtained without recourse to the
unnatural cue conflict paradigms used in previous experiments. The model makes further
predictions for the perception of planar slanted surfaces in the case of wide field of view.
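
The weighting rule described above can be illustrated with a short sketch. It assumes that each cue yields an independent, unbiased shape estimate and that "information" means Fisher information, so that the reciprocal of the summed information bounds the variance of the combined estimate. These assumptions, the function name `combine_cues`, and the numbers are illustrative and are not taken from the model itself.

```python
def combine_cues(estimates, informations):
    """Combine per-cue estimates with weights proportional to each cue's information.

    estimates: one shape estimate per cue (e.g. from density and from orientation).
    informations: the statistical information carried by each cue (assumed to be
    Fisher information for independent cues).
    Returns (weights, combined_estimate, variance_lower_bound).
    """
    total = sum(informations)
    weights = [j / total for j in informations]
    combined = sum(w * e for w, e in zip(weights, estimates))
    variance_lower_bound = 1.0 / total  # bound on the variance of the combined estimate
    return weights, combined, variance_lower_bound

# Hypothetical values: the orientation cue carries three times the information
# of the density cue for this surface.
weights, combined, bound = combine_cues(estimates=[2.0, 4.0], informations=[1.0, 3.0])
print(weights, combined, bound)
```

Under these assumptions an observer restricted to one cue has a variance bound of one over that cue's information alone, so measured response variance below that bound suggests that additional cues are being used.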


Mid-level Vision in Scene Understanding

Edward  Adelson  and  Alex  Pentland 
Massachusetts  Institute  of  Technology 

Mid-level vision can use probabilistic constraints derived from the physical structure of
the world in order to bridge the gap between low-level primitives and higher-level scene
descriptions. For instance, consider the world of "painted polyhedra," in which scenes
project to images consisting of grey polygonal patches. Edges can be caused by changes
in reflectance or changes in illumination (e.g., due to changes in surface normal). It is
generally possible to explain a given image purely in terms of reflectance or purely in
terms of illumination or by combinations thereof; a vision system must search for the
"best" or "most likely" interpretation.

Local strategies, such as junction analysis, restrict the search but are not
sufficient. We find that a global search process, involving 3-D shape recovery, junction
analysis, and lighting analysis, is required to derive a stable, consistent interpretation of
the scene edges and patches.


Koenderink has proposed geometric methods for determining the topologically
distinct views of an object. Starting with a 3D model, this decomposition, referred to as
an aspect graph, provides a complete representation of every unique view of an object.
More specifically, the space of viewpoints can be partitioned into maximal regions
wherein the structure of the line drawing defined by image intensity discontinuities
(edges) is identical; the regions are delineated by visual events where the structure
changes. The structure (topology) of the line drawing is defined by the relationship of
feature points such as T-junctions, vertices, contour terminations (cusps), inflections, etc.
and the smooth contours connecting them. Thus, the object's appearance is qualitatively
similar for all orientations within a region; a qualitative change occurs when the
orientation crosses a visual event boundary. Importantly, results from singularity and
catastrophe theory indicate that there is a relatively small catalogue of visual events and
consequently only a small number of ways that the image structure can change.

If humans do use multiple-views representations (even if such mechanisms are
used only for particular tasks), then a principled, geometric decomposition of the view
space of objects is necessary for organizing viewer-centered information. Furthermore,
because representations of objects are not specified a priori in human vision, we must
learn them as we explore our environment - presumably using image features similar to
those specified by computational theory. Consequently, formal descriptions of object
geometry, including but not limited to current aspect graph methods, offer the
experimental psychologist a principled means for both manipulating the orientation of
objects across surface geometry and analyzing human recognition performance and
perceptual behavior.

While knowledge of the image features that define visual events is helpful in
understanding object structure, it is insufficient for utilizing aspect graph models to study
human shape representation. One must also have the means for decomposing actual
objects into their characteristic views. This requirement has presented a major obstacle in
employing such models in behavioral studies. Crucially, new results have demonstrated
that Koenderink's theory is computationally tractable, and it has since enjoyed increasing
popularity in the computer-vision community. Even still, the majority of work has
focused on polyhedral objects; only recently have there been techniques for computing
the complete aspect graphs of a variety of objects based on a combination of catastrophe
theory, algebraic geometry, and robust numerical methods.

In order to assess the validity of this framework, we have initiated several studies
to capitalize on these computational methods in psychophysical studies. We have begun
by conducting a series of experiments to investigate whether humans are indeed sensitive
to the features used in determining the topologically distinct views of an aspect graph.
The subjects' task was to judge whether two consecutively presented images of the same
smoothly curved object (rendered with occluding contours or shaded) were displayed at
the same or at different orientations (generated by rotations in depth). Performance was
assessed by measuring their accuracy in detecting an orientation difference between two
images. As accuracy increases, subjects are demonstrating an increased ability for
discriminating a change in view. When one compares the locations of the visual events as
predicted by the computational theory - that is, the orientations where the aspect graph
makes the transition from one view to another - to the percent correct function, it is clear
that accuracy in discriminating orientations does increase when images cross a visual
event. In general, adjacent image pairs separated by visual events exhibited large
increases in sensitivity.


Finally, we describe some unpublished work describing a toy world of
stereoscopic figures, where only 2 or 3 disparities are defined and where a wealth of
depth rankings can be discerned. Yet, each perceived configuration shows the existence
of mutual constraints, not dissimilar to those originally suggested in computer algorithms
to interpret scenes from line drawings.

8:00 (after dinner) D. Mumford (Harvard)
"Perception via pattern theory"


Perception  via  Pattern  Theory 

David  Mumford 
Harvard University

Grenander's ideas from many years ago seem to be taking on very concrete forms in
recent work in computer vision and seem to be working. I would like to try to pull
together his vision of the foundations of perception and contrast it with other approaches
(e.g. Poggio's, Ullman's). I want to mention some extensions of his ideas: to cognitive
thinking in general, to learning via minimum description length and to neural algorithms
which may implement the theory.